EXECUTIVE SUMMARY


There is a need for rapid prediction of the physico-chemical properties of
chemical warfare agents (CWAs) and toxic industrial chemicals (TICs) on environmentally
relevant materials, personal protective equipment, and human tissue. While in the past it was
possible to concentrate laboratory characterization efforts on a limited number of known,
traditional CWAs and TICs, there is the possibility that state and non-state actors may use CWAs
outside of the traditional CWAs that have distinctly different physical and chemical properties.
Rapid, reliable hazard assessments for the persistence and spread of non-traditional agents may
be necessary for the benefit of first responders and clean-up teams before laboratory
measurements can be done. Predictive tools also serve as screening tools that help identify
compounds that may be particularly difficult to decontaminate. It is the objective of this report
to survey the reliability and error spread from several available in-silico tools for a set of
physico-chemical properties that impact prediction of environmental fate of a set of traditional
CWAs and simulants. Except for ADF COSMO-RS, the tools are based on quantitative
structure-activity/property relationship (QSAR/QSPR) methods using molecular fragment
(group-contribution) approaches, which make predictions based on a regression of laboratory
measurements performed on similar chemicals and the underlying statistical correlations that
describe the property variations resulting from specific groups of atoms within each compound.

EPI Suite, ACD Labs, Marvin, Vega, and COSMO-RS were used to predict
properties such as the boiling point, vapor pressure, log of the octanol/water partitioning
coefficient (Kow), water solubility, and pKa. For boiling point, ACD Labs and EPI Suite
were accurate to within 20 °C and 29 °C, respectively. EPI Suite, ACD Labs, and ADF COSMO-RS
performed quite well for vapor pressure predictions, except that ACD Labs could not generate
predictions for vapor pressures of less than 0.1 Pa. For Kow, EPI Suite, ACD Labs, ChemAxon’s
Marvin, Vega, and ADF COSMO-RS were evaluated, and except for COSMO-RS, the differences
between estimated and measured values were less than one log unit. With respect to water
solubility, EPI Suite’s Kow-based estimation method yielded the smallest average
difference between prediction and experiment, 0.87 log units, although ACD Labs could
give predictions over a range of temperatures and pH. The greatest difference between
prediction and experimental measurement occurred for ADF COSMO-RS, presumably because,
although it is partially based on accurate Density Functional Theory (DFT) calculations, its
overall prediction employs an empirical fit based on a small training set of
only 642 compounds. For pKa estimations, experimental data for only three of the compounds
examined were readily available in the literature, and so the accuracy of ACD Labs and
ChemAxon’s Marvin for the traditional agents could not be adequately evaluated.

Based on the brief survey of estimation methods, both EPI Suite and ACD Labs
gave excellent results for boiling point and vapor pressure. For Kow estimations, EPI Suite, ACD
Labs, Marvin, and Vega gave estimations that were reasonable and on average less than one log
unit off the published measurement. Larger errors were encountered with water solubility and
pKa estimations. We reason that two issues contribute to the difference between experimental
measurements and estimations from fragment-based methods. First, as would be expected,
compounds containing elements or functional groups outside of the method’s training set
contributed to the average error. The second issue apparent from examination of the
data is that the molecules that tend to produce the largest differences between model prediction and
experimental measurement have molecular symmetry. For properties highly dependent on
molecular structure and polarity, such as water solubility, a fragment-based method can
contribute significantly to the error, since the fragment contributions are treated additively. It is
possible that, in a symmetric molecule, the contributions from a given fragment partially cancel,
which an additive scheme cannot capture, so that a property is
overestimated. We recommend caution with respect to estimations from fragment-based
methods for molecules that possess symmetry, or possess unusual functional groups or atomic
linkages (e.g. N-P bond). We expect that estimation methods based on descriptors of the entire
molecule rather than fragments should be more robust with respect to symmetry and functional
groups outside of the method’s training set.


Examples of Property Predictions That Differ the Most From Experiment


Water Solubility (mg/L)

Compound                          EPI Kow    EPI Frag   ACD        EXP
HN1 (nitrogen mustard)            4.0E+04    7.3E+03    1.5E+04    1.6E+02
diisopropyl methyl phosphonate    7.3E+03    2.2E+05    3.4E+04    1.5E+03
GA                                3.2E+04    1.0E+06    3.5E+05    9.8E+04
GB                                4.6E+04    1.0E+06    4.2E+05    1.0E+06
GD                                1.6E+03    3.4E+05    5.0E+04    2.1E+04
GF                                2.1E+03    5.4E+05    6.2E+04    3.7E+03
L1                                2.6E+02    4.7E+03    n/a        5.0E+02
VX                                3.2E+03    9.1E+04    4.8E+03    3.0E+04
disperse red-9                    6.8E-01    8.6E+00    4.5E-04    1.2E-01

pKow

Compound                          EPI    ACD    MARVIN   Vega   EXP
HD (sulfur mustard)               2.4    2.1    2.0      3.2    1.37
VG                                1.7    2.9    1.8      1.7    1.7
disperse red-9                    4.1    3.0    3.0      3.0    4.1

pKa

Compound                          ACD    ADF/COSMO   Marvin   EXP
VX                                9.8    7.9         10.6     8.6

Illustrative Tables: Compounds and predicted measurements highlighted in italics show
predicted values of water solubility (mg/L), pKow and pKa that differ from experimental
measurements by more than an order of magnitude or 1.0 pK units. These molecules (some
shown below) tend to have a high degree of symmetry or unusual elements or combinations of
atoms.
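
The order-of-magnitude criterion in the caption can be checked mechanically. The following is a minimal Python sketch, not part of the original study, that recomputes the log10 difference between each predicted and measured water solubility from the values tabulated above and lists the pairs exceeding one log unit. The function names and the dictionary layout are illustrative choices.

```python
import math

# Water solubility values (mg/L) transcribed from the summary table above.
# Each entry: compound -> (EPI Kow, EPI Frag, ACD, experiment); None marks "n/a".
SOLUBILITY_MG_PER_L = {
    "HN1": (4.0e4, 7.3e3, 1.5e4, 1.6e2),
    "diisopropyl methyl phosphonate": (7.3e3, 2.2e5, 3.4e4, 1.5e3),
    "GA": (3.2e4, 1.0e6, 3.5e5, 9.8e4),
    "GB": (4.6e4, 1.0e6, 4.2e5, 1.0e6),
    "GD": (1.6e3, 3.4e5, 5.0e4, 2.1e4),
    "GF": (2.1e3, 5.4e5, 6.2e4, 3.7e3),
    "L1": (2.6e2, 4.7e3, None, 5.0e2),
    "VX": (3.2e3, 9.1e4, 4.8e3, 3.0e4),
    "disperse red-9": (6.8e-1, 8.6e0, 4.5e-4, 1.2e-1),
}
METHODS = ("EPI Kow", "EPI Frag", "ACD")


def log_error(predicted, measured):
    """Signed difference in log10 units; positive means overestimation."""
    return math.log10(predicted) - math.log10(measured)


def flag_outliers(table, threshold=1.0):
    """Return (compound, method, error) for |log10 error| above the threshold."""
    flagged = []
    for compound, values in table.items():
        *predictions, measured = values
        for method, predicted in zip(METHODS, predictions):
            if predicted is None:  # no prediction was available
                continue
            err = log_error(predicted, measured)
            if abs(err) > threshold:
                flagged.append((compound, method, round(err, 2)))
    return flagged


if __name__ == "__main__":
    for compound, method, err in flag_outliers(SOLUBILITY_MG_PER_L):
        print(f"{compound:32s} {method:9s} {err:+.2f} log units")
```

A signed error is kept so that over- and underestimation remain distinguishable; the threshold of 1.0 mirrors the "order of magnitude" wording in the caption.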
A COMPARISON OF QSAR BASED THERMO AND WATER SOLVATION
PROPERTY PREDICTION TOOLS AND EXPERIMENTAL DATA FOR SELECTED
TRADITIONAL CHEMICAL WARFARE AGENTS AND SIMULANTS


1.  INTRODUCTION 

1.1  The Need for a Rapid Prediction Capability

There is a need for rapid prediction of the physico-chemical properties of
chemical warfare agents (CWAs) and toxic industrial chemicals (TICs) on
environmentally relevant materials, personal protective equipment, and human tissue.
While in the past it was possible to concentrate laboratory characterization efforts on a
limited number of known, traditional CWAs and TICs, there is the possibility that state and
non-state actors may use CWAs outside of the traditional CWAs that have distinctly
different physical and chemical properties. Due to the toxicity of many potential
compounds not within the list of traditional agents, and because of the difference in
behavior on different environmental media, it is not feasible to perform laboratory
measurements of all compounds of interest on all possible environmental media. A
similar issue confronts government regulatory agencies such as the Environmental
Protection Agency (EPA), where the vast number of compounds produced by industry
exceeds their laboratory capacity to characterize every possible compound. However,
predictive tools help prioritize compounds of interest and target compounds that may have
properties that contribute to persistence in the environment or properties that impede
decontamination. First responders and clean-up teams may also require rapid, reliable
estimations of contamination area and penetration into materials before any laboratory
property measurements can be done.

The CWA physico-chemical properties contribute to complex but critical
processes such as environmental fate, pathways into the body, as well as the innate
toxicity of the compound, and all of these factors contribute to the overall threat. Some
examples of physico-chemical properties of importance include solubility in water,
ionizability in water (pKa), vapor pressure, boiling point, and partitioning
coefficients, such as Kow, the partitioning coefficient between octanol and water.
Solubility in water can affect environmental transport of a given CWA, and whether the
compound can undergo degradation by hydrolysis. Ionizability in water is related, and also
contributes to whether a compound can degrade in the environment. Vapor pressure and
boiling point affect persistence. Sarin has a relatively high vapor pressure and lower
boiling point, and tends to evaporate from an affected area within hours. In contrast, VX
and sulfur mustard have low vapor pressures and can persist for long periods of time. In
addition, low vapor pressure increases the difficulty of detection. Kow is known to
be an indicator for the pathway into the body, where a high value indicates a cutaneous threat.
Such properties are also indicators of whether the compound penetrates personal protective
equipment. Further complicating matters, the interaction of CWAs with common materials
in the environment, such as concrete, sand, and soil, affects the fate of the CWAs and
whether the threat from that agent persists over time. As a result, it is not feasible to
experimentally measure all possible combinations of threatening materials and
environmental substrates, and there is an acute need for predictive in-silico tools. In other
words, reasonably accurate in-silico predictive tools can augment cost-limited laboratory
resources by identifying the compounds most likely to have threatening properties.

1.2  Inputs for Tools for Predicting Environmental Fate

Because of the need for in-silico tools that predict environmental fate of
toxic compounds, the focus of this study is to examine a number of existing tools that
predict physico-chemical properties. The property predictions in turn can be used as inputs
into larger scale models that predict environmental fate. There are a number of methods
and models for predicting environmental fate of toxic chemicals, particularly pesticides.
Because of the similarity of many CWAs to common pesticides, it is possible to leverage
these tools. Modeling tools such as PEARL and HYDRUS make predictions of
environmental transport and degradation using the physico-chemical properties of a target
compound as inputs. These tools predict the persistence of a compound in the
environment, whether it can contaminate ground water, its behavior in various types of
soil, and so on. Such predictions are essential for assessing the long-term threat, but these tools
rely on accurate predictions of physico-chemical properties to obtain reliable
environmental fate predictions.

A  number  of  Quantitative  Structure-Activity  Relationship  (QSAR)  based 
software  tools  currently  exist,  such  as  EPI  Suite,  VEGA  (a  component  of  CAESAR),  ACD 
Labs  Suite,  ChemAxon  MARVIN,  and  SPARC,  that  utilize  geometric/functional  group- 
contribution  descriptors  to  predict  physical  properties  that  feed  into  the  fate  models.  Some 
are  freely  available,  such  as  EPI  Suite,  while  others  are  commercially  licensed.  Given  the 
variety  of  predictive  tools,  an  assessment  of  how  these  tools  perform  compared  to 
experimental  data  is  desirable. 

1.3  Objective 

It is the objective of this report to survey the reliability and error spread
from several available in-silico tools for a set of physico-chemical properties that impact
prediction of environmental fate. We specifically examine performance against a
set of traditional CWAs and simulants. Because the environmental fates of other organic
compounds, such as industrial dyes and pharmaceuticals, are also relevant, we include
Disperse Red 9 and cocaine. Some of the tools, as mentioned above, are based on a
fragment-based (that is, group-contribution) QSAR approach, which makes predictions based on a
regression of laboratory measurements performed on similar chemicals. This QSAR
approach has been well established for predicting general physico-chemical properties,
especially in the pharmaceutical industry. A method also exists that makes predictions from a
descriptor calculated from density functional theory (DFT), which is an electronic structure
method, and we include results from the COnductor-like Screening MOdel for Real
Solvents (COSMO-RS) for comparison. We do not intend to fully analyze the results
of COSMO-RS, since such a detailed examination of the electron density around the
molecule is outside the scope of this study. Although typical performance studies involve
hundreds or thousands of compounds, we limit the scope to a set of traditional
CWAs and simulants, with the aim of providing a guide for usage of existing predictive tools in
the study of compounds related to traditional CWAs.


2.  BACKGROUND 

2.1  Overview of QSAR Methods Used

The physico-chemical prediction tools examined in this report comprise four
QSAR-based tools and one DFT electronic-structure-based tool. QSARs are a well
established method of property prediction first demonstrated for petroleum components.
QSARs are simple mathematical regression models of the form

$$ Y_{\mathrm{pred}} = c_0 + c_1 X_1 + c_2 X_2 + \cdots + c_n X_n + (\text{higher-order terms}) $$    Equation (1)

where Ypred is the predicted property, c0 is a constant, c1 to cn are coefficients from the
regression to the training set of measurements, X1 to Xn represent molecular, fragment, or
field-based descriptors, and the final term in Equation 1 represents higher order terms. The
descriptors  are  some  property  or  characteristic  of  the  molecular  structure.  Table  1  shows 
some  of  the  classes  of  descriptors  as  well  as  some  examples  of  those  descriptors.  The 
training  set  is  a  subset  of  compounds  that  have  had  the  property  of  interest  measured  in  the 
laboratory.  A  regression  fit  is  performed  to  relate  the  laboratory  measurements  of  these 
compounds  to  the  coefficients  in  the  model.  Validation  and  error  assessment  of  the  QSAR 
model  is  performed  with  the  remaining  laboratory  measurements  that  were  not  included  in 
the  original  training  set.  The  model  is  generally  valid  for  chemical  compounds  that  are 
similar  to  those  used  in  the  training  set. 
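
The train-then-validate workflow described above can be illustrated with a short Python sketch. This is not the procedure used by any of the tools surveyed here; it fits only the linear part of Equation (1) by ordinary least squares, and the descriptor counts and property values are synthetic numbers invented for the example.

```python
def predict(c0, coeffs, descriptors):
    """Evaluate the linear part of Equation (1): c0 + sum(c_i * X_i)."""
    return c0 + sum(c * x for c, x in zip(coeffs, descriptors))


def fit_least_squares(X, y):
    """Ordinary least squares via the normal equations (A^T A) b = A^T y.

    X is a list of descriptor rows, y the measured property values.
    Returns (c0, [c1, ..., cn]).
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # intercept column first
    n = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(M[k][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("descriptors are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for k in range(n):
            if k != col:
                f = M[k][col] / M[col][col]
                M[k] = [a - f * b for a, b in zip(M[k], M[col])]
    beta = [M[i][n] / M[i][i] for i in range(n)]
    return beta[0], beta[1:]


# Synthetic training set (hypothetical): two fragment counts per compound,
# with a property generated from y = 10 + 2*X1 + 3*X2.
train_X = [[1, 0], [0, 1], [1, 1], [2, 1], [3, 2]]
train_y = [12.0, 13.0, 15.0, 17.0, 22.0]

c0, coeffs = fit_least_squares(train_X, train_y)

# Held-out "validation" compound that was not used in the fit.
validation_error = predict(c0, coeffs, [2, 2]) - 20.0
```

In practice the validation error on held-out measurements, rather than the fit residual on the training set, is what indicates how far the model can be trusted for compounds resembling the training set.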

The properties examined in the present study include 1) boiling point at 1
atm (101325 Pa), 2) vapor pressure at standard ambient temperature (25 °C), 3) solubility
in water (mg/L), 4) -log(acid/base dissociation constant) in water (pKa), and 5) the
log(octanol/water partitioning coefficient) (Kow). For properties 1 to 3, EPI Suite version
4.11 and ACD/Labs/PhysChem 12.0 were used. For property 4 (pKa), ACD Labs and
ChemAxon’s MARVIN pKa and Kow calculators were used to make predictions. For
property 5, the Kow, predictions were available from EPI Suite, ACD/Labs, MARVIN, and
VEGA. These estimation models are based on group-contribution methods. An additional
method called ADF COSMO-RS that is based on DFT methods was also used for
comparison. Certain molecular fragments or functional groups tend to add to the
magnitude of given properties. More frequent occurrences of these groups result in greater
magnitudes for those properties, and some of the methods include correction factors that
are summed into the property value based on the occurrence of certain atoms or other
functional groups. For example, with pKa estimations, certain functional groups are
known to be ionizable, such as amines. Occurrences of these functional groups are known
to correlate with the experimental pKa, and as a result, they can be used to generate an
informed estimate.
Table 1: Molecular Descriptor Classes and Examples

Descriptor classes:   Constitutional; Electronic/geometric; Physico-chemical;
                      Fragment/Structure; Topological

Example descriptors:  # H-bonds; dipole moment; lipophilicity; functional groups;
                      atomic branching; Hammett constants; molecular volume;
                      polarizability; bonds to atom; # double bonds;
                      molecular shape index; molecular weight; polar surface area;
                      # rings; electrostatic field


2.1.1  Differences between EPI Suite/MPBPWIN and ACD Labs

The MPBPWIN component of EPI Suite utilizes the method of Stein &
Brown to estimate the boiling points. Using the standard QSAR approach, a linear model
is used to estimate boiling point. Starting from the same 41 groups used by Joback and Reid,
an additional 85 groups are added to the method. It should be noted that some of the
additional groups are subgroups of the original 41. The training set consisted of 4426
different organic compounds, with an additional set of 6584 measurements for validation
of the method.


In contrast to the method used in EPI Suite, ACD Labs does not
calculate the boiling point directly using a QSAR approach, but rather the value of a function K. The ACD
Labs User’s Guide states that the boiling point follows a nonlinear form similar to the
Antoine equation:


$$ n_i = a_0 + \frac{a_1}{bp - a_2} $$    Equation (2)


where n_i corresponds to the number of occurrences of group i, bp corresponds to the
boiling point, and a_0, a_1, and a_2 are empirically determined constants. However, they
determined that the value of K, a function of the molecular volume (MV) and the boiling
point (BP), is linearly dependent on the occurrences of given groups:


$$ K = f(MV, BP) = c_0 + \sum_i c_i n_i $$    Equation (3)


where n_i is the number of occurrences of a given group within a molecular structure, c_i is a
weighting factor for that group that is determined by a regression of data, and c_0 is a constant
factor also determined from a regression fit of data. Because the algorithm is proprietary
to ACD Labs, the form of the function relating the boiling point to K could not be found.


For vapor pressure, the most reliable method within EPI Suite is the
modified Grain method shown in Lyman, which relates vapor pressure to the boiling point
as calculated above via a simple equation. This equation calculates vapor pressure for
both solid and liquid materials.

2.1.2  Calculation Methods for the Octanol/Water Partition Coefficient

For this survey of predictive tools, a total of five software packages were
available, including EPI Suite, ACD Labs, ChemAxon’s Marvin, Vega, and ADF
COSMO-RS. EPI Suite’s KOWWIN is a QSAR-based model with two regressions.
First, the molecule is divided into fragments based on core, non-hydrogen atoms. For 1120
different compounds with good experimentally determined Kow, an initial regression yields
weighting coefficients for the different fragments. Residual errors remain, so
correction factors are determined from a more detailed grouping of the molecular fragments,
taking into account structures such as rings or specific functional groups. The weightings
for the correction factors are determined by a second regression with the full set of 2447
compounds with experimentally determined partition coefficients. The model within Vega
is based on the same approach. The description of the ACD Labs Log(P) algorithm in the
User Guide is similar in that it is a QSAR-based approach, but correction factors are not
supplied. This algorithm assigns molecular fragments based on an internal database of 500
different functional groups as well as increments for different hybridizations of carbon
atoms. Additional increments for 2000 intramolecular interactions, such as ring structures
and proximity to given functional groups, are included. A relatively large training set of
18,412 chemical compounds is used. ChemAxon’s log(P) (Kow) calculator is QSAR
based, utilizing a regression on molecular fragment descriptors as described in
Viswanadhan. This approach is augmented by including atomic partial charges, electron
delocalization, ionic forms, and molecular polarizability. The model is also refined by
additional molecular fragments. The model used in the Vega tool relies on the same
approach by Viswanadhan, and uses a training set of 2524 compounds.

2.1.3  Approaches  for  Calculating  Water  Solubilities 

Two separate methods are available within EPI Suite. One method uses
either an experimental or estimated Kow to generate an estimate, and the second method
uses the fragment-based approach to get its estimate. In the first approach, an equation
relating the water solubility to the Kow and a fragment-determined correction factor in
Meylan & Howard is used. If a reliable melting point temperature is available, then a
second, similar equation is used that includes that quantity. The correction factors depend
on the appearance of 15 different chemical functional groups, and a dataset of 1450
measurements of Kow was used in the regression. The results of both approaches are
shown here. ACD Labs references Meylan & Howard for its method as well, but no
additional information could be located in the documentation.

2.1.4  Approaches  for  Calculating  pKa 

As described in the software documentation, ACD Labs uses a fragment
approach for calculating pKa, and it relies on the presence of heteroatoms in the
hydrocarbon structure for estimation. Hammett first observed a simple equation to
describe the dependence of the pKa of substituted compounds on the pKa of the
unsubstituted compound. ACD Labs uses parameterized Hammett-type equations to
describe 1500 possible combinations of more than 650 ionizable functional groups. The
change in pKa encountered when substituting different functional groups on the original
ionizable fragment is encapsulated within the electronic substituent constant, which is
determined for over 1200 possible substituents. Additional corrections are added to
account for the effect of distance through the hydrocarbon backbone between the ionizable
center and a substituent, as well as through aliphatic and aromatic rings. ChemAxon’s
Marvin pKa calculator uses a fragment-based approach as well, where pKa depends
linearly on fragment partial charge increments, polarizability increments, and structure
specific increments, such as rings. Predictions are based on a regression relating these
increments to the experimental data. Information on the data training set could not be
located for the Marvin pKa prediction tool.
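
For orientation, the textbook form of the Hammett relation is pKa = pKa(parent) - rho * sum(sigma), where sigma is the substituent constant and rho the reaction constant. The Python sketch below evaluates only this textbook form; the ACD Labs parameterization is proprietary and is not reproduced here. The numerical inputs in the example (benzoic acid pKa of 4.20, rho of 1.00, para-nitro sigma of 0.78) are commonly tabulated textbook values assumed for illustration and do not come from this report.

```python
def hammett_pka(pka_parent, rho, sigmas):
    """Textbook Hammett estimate: pKa = pKa(parent) - rho * sum(sigma).

    pka_parent: pKa of the unsubstituted compound
    rho:        reaction constant for the ionizable group
    sigmas:     substituent constants, assumed to act additively
    """
    return pka_parent - rho * sum(sigmas)


# Illustrative inputs (assumed textbook values, not taken from this report):
# benzoic acid as parent, with one para-nitro substituent.
estimate = hammett_pka(pka_parent=4.20, rho=1.00, sigmas=[0.78])
```

Electron-withdrawing substituents (positive sigma) lower the estimated pKa, which is the behavior the substituent constants described above are meant to capture.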

2.2  Principle  of  Operation  of  COSMO-RS 

The  properties  (boiling  point,  vapor  pressure,  Kow,  etc.)  examined  in  this 
report  are  a  reflection  of  the  molecular  solvation  properties.  For  example,  Kow  reflects  the 
different  energetics  of  solvation  for  a  solute  molecule  in  water  and  in  octanol.  Calculation 
of  these  energetics  using  an  electronic  structure  method  for  both  the  solute  molecule  and 
sufficient  solvent  molecules  to  simulate  solvation  is  still  prohibitively  computationally 
expensive.  Fortunately,  polarizable  continuum  models  for  solvents  are  proven  to  be 
reliable  yet  computationally  tractable.  As  a  result,  it  is  possible  to  calculate  solvation 
properties  of  a  given  molecule  from  the  calculated  chemical  potential  of  the  molecule  in 
solution and the gas phase. ADF COSMO-RS relies on the difference between the charge
density  on  the  molecular  surface  in  vacuum  and  the  charge  density  of  the  molecular  surface 
within  a  polarizable  continuum  model. 

Polarizable  continuum  models  treat  the  solvent  around  a  solute  molecule  as 
an  infinite  continuum  with  some  dielectric  constant.  A  cavity  is  carved  out  of  this 
continuum  to  make  way  for  the  solute  molecule.  The  solute  molecule  has  some 
distribution  of  electrostatic  charge  contributing  to  a  dipole  moment.  The  polarizable 
continuum  responds  to  this  electrostatic  charge  resulting  in  an  image  charge  that  “screens” 
the charges in the solute molecule. While the solvent continuum will affect the electronic
structure of the solute molecule, the electronic structure can be refined until self
consistency is obtained with the solvent continuum. There is an energy associated with
this screening charge, and that screening charge serves as a descriptor for models of
boiling point, vapor pressure, solubility, pKa, miscibility, etc. Because the molecular
descriptor is the result of a quantum mechanical calculation of a molecule, rather than from
a group contribution method, we expect the approach to be more robust towards molecules
containing groups not included in the original training set.

For COSMO-RS, the cavity geometry is related to the radii of the atoms in
the solute molecule. COSMO-RS is not a first principles method, because the atomic radii
are fitted by a regression of 642 data points for a variety of properties, such as partition
coefficients and vapor pressure. The result is approximately 120% of the van der Waals
atomic radii. In the respect that certain degrees of freedom are fixed by a regression fit to
experimental data, the COSMO-RS method has some similarity to existing QSAR methods. A
key difference is that instead of relying on the structure alone to make predictions,
COSMO-RS calculates a molecular descriptor based on the charge distribution on the
molecule within a polarizable continuum and the molecule in vacuum.

According to the ADF COSMO-RS Tutorial
(http://www.scm.com/Doc/Doc2010/CRS/CRSGUI_tutorial/pagedS.html), pKa estimates
can be made from four ADF calculations: 1) DFT gas-phase geometry optimization of the
compound of interest, 2) COSMO-RS calculation of this optimized structure in an implicit
water polarizable continuum solvent, 3) DFT gas-phase geometry optimization of the
conjugate acid or base of the compound of interest, and 4) COSMO-RS calculation of the
optimized conjugate structure in the continuum solvent. For multiprotic compounds, this
four-step process can be repeated to individually calculate the pKa of each protonatable
site, although the COSMO-RS parameterization was limited to monoprotic molecules only,
so multiprotic results are expected to be inaccurate due to the large charges of the
unparameterized ions.

At T = 298.15 K, ADF COSMO-RS uses the following equations when the compound of
interest is an acid (deprotonation of acid -> conjugate base):

$$ \mathrm{HA\,(aq, 1\,M)} + \mathrm{H_2O\,(l)} \rightarrow \mathrm{H_3O^+\,(aq)} + \mathrm{A^-\,(aq)} $$    Equation (4)

$$ \mathrm{p}K_a = 0.62 \times 0.733 \times \Delta G \times \mathrm{mol/kcal} + 2.10 $$    Equation (5)

$$ \Delta G = G(\text{conjugate base}) - G(\text{acid}) + G(\text{hydronium}) - G(\text{water}) $$    Equation (6)

ADF COSMO-RS uses the following equations when the compound is a base
(deprotonation of conjugate acid -> base):

$$ \mathrm{HB^+\,(aq, 1\,M)} + \mathrm{H_2O\,(l)} \rightarrow \mathrm{H_3O^+\,(aq)} + \mathrm{B\,(aq)} $$    Equation (7)

$$ \mathrm{p}K_a = 0.67 \times 0.733 \times \Delta G \times \mathrm{mol/kcal} - 2.00 $$    Equation (8)

$$ \Delta G = G(\text{base}) - G(\text{conjugate acid}) + G(\text{hydronium}) - G(\text{water}) $$    Equation (9)

where

G(hydronium) = -310.737 kcal/mol (ADF COSMO-RS calculation result)

G(water) = -332.353 kcal/mol (ADF COSMO-RS calculation result)
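
Equations (4) to (9) translate directly into a few lines of Python. The sketch below is an illustration rather than the ADF implementation: the function names are invented here, and the interpretation of the factor 0.733 mol/kcal as 1/(RT ln 10) at 298.15 K is our reading of the equations, which the helper `inv_rt_ln10` checks numerically. All G values are in kcal/mol.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal/(mol K)
T_REF = 298.15         # temperature in K, as stated in the text

G_HYDRONIUM = -310.737  # kcal/mol, value quoted in the text
G_WATER = -332.353      # kcal/mol, value quoted in the text


def inv_rt_ln10(temperature=T_REF):
    """Return 1/(R T ln 10) in mol/kcal; about 0.733 at 298.15 K."""
    return 1.0 / (R_KCAL * temperature * math.log(10.0))


def pka_acid(g_acid, g_conjugate_base):
    """Equations (5) and (6): compound of interest is an acid."""
    delta_g = g_conjugate_base - g_acid + G_HYDRONIUM - G_WATER
    return 0.62 * 0.733 * delta_g + 2.10


def pka_base(g_base, g_conjugate_acid):
    """Equations (8) and (9): compound of interest is a base."""
    delta_g = g_base - g_conjugate_acid + G_HYDRONIUM - G_WATER
    return 0.67 * 0.733 * delta_g - 2.00
```

For example, `pka_acid(g_acid, g_conjugate_base)` would be called with the two Gibbs free energies returned by the COSMO-RS calculations in steps 2 and 4 of the procedure above; the slope and intercept are the empirical fit parameters given in Equations (5) and (8).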

The ADF COSMO-RS calculations output the Gibbs free energy (G) values.
Using the equations above and the G values obtained from ADF COSMO-RS, pKa values
were calculated for each compound and conjugate pair. The acid set of equations gave the
best (highest) pKa values, even when the compound of interest was a base. For example,
in the case of an amine species, the ammonium ion form (acid) was treated as the
compound of interest, the amine form (base) was treated as the conjugate, and the acid
parameter values above were used to calculate pKa. However, the reverse of this
approach, i.e., treating the amine as the compound and the ammonium as the conjugate and
using the base parameter values, should have matched the ion charges better (HB+ -> B
rather than HA -> A-), according to the parameterization. For glycerol, the middle OH
group had a lower pKa value and was assumed to deprotonate before either of the two end
OH groups.

2.3  Overview  of  Traditional  Chemical  Warfare  Agents  and  Simulants 

Chemical warfare agents were initially used in the First World War.
Compounds arising from that time period include sulfur mustard (HD) and the Lewisite
compounds (L1, L2, and L3). Due to the similarities of its physical properties, glycerol is
typically used as a simulant for mustard agents. Compounds representative of the Second
World War time period include the nitrogen mustards, e.g. HN1, and organophosphate
nerve agents tabun (GA), sarin (GB), soman (GD), and cyclosarin (GF). The V series
agents (VG and VX) are representative of Cold War agents. Because of their similarity to
organophosphate pesticides, common pesticides and related compounds can serve as simulants for the G and V
agents, such as DMMP, DIMP, methamidophos, and malathion. These compounds were
considered to be somewhat representative of the chemical warfare agent threat and their
chemical structures were used in the present study. Also, for most of the chemical
properties of interest, e.g. solubility and Kow, laboratory data were readily available.
Because the fate of pharmaceuticals and dyes in the environment was also of interest,
Disperse Red 9 and the cocaine molecule were considered.

3.  METHODS 

EPI Suite version 4.11 (KOWWIN v1.68, MPBPWIN v1.48, WaterNT v1.0,
and WSKOWWIN v1.42), ACD Labs version 12.0, ChemAxon's pKa and Kow online
calculators, and the VEGA Non-interactive Client for LogP (Kow) predictions were used
in this study. To accommodate calculations of large numbers of compounds in a batch
format, all predictive tools accept input files containing the structures in Simplified
Molecular-Input Line-Entry System (SMILES) notation, or the CAS numbers. Table 2 lists
the traditional chemical warfare agents and simulants, as well as Disperse Red 9, a smoke
grenade dye, along with the CAS numbers and SMILES structures that were used as input
to the predictive tools. For ADF COSMO-RS, two quantum mechanical DFT
calculations were necessary: one simulating vacuum, and the other simulating the
molecule of interest embedded in a polarizable continuum (COSMO). The Becke '88
Perdew '86 (BP86) functional with the triple-zeta TZP basis set was used for the quantum
mechanical calculations.
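
A batch input file of the kind described above can be sketched in a few lines of Python. The tab-separated layout (SMILES, then compound name), the function name, and the three-compound subset are assumptions made for illustration; each tool defines its own accepted input format, which should be checked in its documentation.

```python
import io

# Subset of the structures listed in Table 2 (name -> SMILES).
COMPOUNDS = {
    "glycerol": "C(O)(CO)CO",
    "GB": "CP(=O)(F)OC(C)C",
    "VX": "C(C)(C)N(C(C)C)CCSP(C)(=O)OCC",
}


def write_smiles_batch(compounds: dict, handle) -> int:
    """Write one 'SMILES<TAB>name' record per line; return the record count."""
    count = 0
    for name, smiles in compounds.items():
        handle.write(f"{smiles}\t{name}\n")
        count += 1
    return count


buffer = io.StringIO()  # stands in for an open text file
n_written = write_smiles_batch(COMPOUNDS, buffer)
```

Keeping the writer independent of the file object makes it straightforward to target a different delimiter or column order for a tool that expects one.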


Table 2: Compound Names, SMILES Structures, and CAS Numbers

| Compound | SMILES structure | CAS# |
|---|---|---|
| glycerol | C(O)(CO)CO | 56-81-5 |
| HD (sulfur) | C(Cl)CSCCCl | 505-60-2 |
| HN1 (nitrogen mustard) | C(Cl)CN(CCCl)CC | 538-07-8 |
| metamidophos | COP(N)(=O)SC | 65960-97-6 |
| malathion | C(=O)(C(CC(=O)OCC)SP(=S)(OC)OC)OCC | 121-75-5 |
| DMMP | COP(C)(=O)OC | 756-79-6 |
| DIMP | C(C)(C)OP(C)(=O)OC(C)C | 169301-54-6 |
| GA | C(#N)P(=O)(N(C)C)OCC | 77-81-6 |
| GB | CP(=O)(F)OC(C)C | 107-44-8 |
| GD | C(C)(C)(C)C(C)OP(C)(=O)F | 96-64-0 |
| GF | C1(OP(C)(=O)F)CCCCC1 | 329-99-7 |
| L1 | C(=CCl)[As](Cl)Cl | 541-25-3 |
| L2 | C(=CCl)[As](Cl)C=CCl | 40334-69-8 |
| L3 | C(=CCl)[As](C=CCl)C=CCl | 40334-70-1 |
| VX | C(C)(C)N(C(C)C)CCSP(C)(=O)OCC | 50782-69-9 |
| VG | C(CN(CC)CC)SP(=O)(OCC)OCC | 78-53-5 |
| Disperse Red 9 | O=C2c1ccccc1C(=O)c3c2cccc3NC | 82-38-2 |
| cocaine | CN1[C@H]2CC[C@@H]1[C@H]([C@H](C2)OC(=O)C3=CC=CC=C3)C(=O)OC | 50-36-2 |

4.  RESULTS 

4.1  Boiling  Point 

Table 3: Predicted Boiling Points and Experimental Measurements in Degrees Celsius

| Compound | EPI | ACD | COSMO-RS | Experiment |
|---|---|---|---|---|
| glycerol | 231 | 290 | 325 | 290 |
| HD (sulfur) | 210 | 216 | 265 | 216 |
| HN1 (nitrogen mustard) | 212 | 136 | 301 | 194 |
| metamidophos | 223 | 209 | 324 | high* |
| malathion | 351 | 385 | 701 | high* |
| dimethyl methylphosphonate (DMMP) | 152 | 181 | 283 | 181 |
| diisopropyl methylphosphonate (DIMP) | 210 | 214 | 394 | high* |
| GA | 267 | 240 | 324 | 240 |
| GB | 140 | 147 | 218 | 147 |
| GD | 183 | 201 | 306 | 198 |
| GF | 223 | 237 | 278 | 239 |
| L1 | 156 | 203 | 215 | 196.6 |
| L2 | 204 | 229 | 290 | n/a |
| L3 | 247 | 215 | 318 | n/a |
| VX | 321 | 319 | 550 | 298 |
| VG | 337 | 315 | 613 | vacuum* |
| Disperse Red 9 | 397 | 463 | 382 | n/a |
| cocaine | 363 | 395 | 505 | n/a |
| RMSE | 29 | 20 | 107 | |

*Measurement performed under vacuum rather than at 101 kPa (760 mmHg), and as a result it
cannot be compared to the prediction.

n/a: Experimental value not available because of a lack of measurement, or because of
decomposition or sublimation upon heating.


Table 3 compares predicted boiling points of the set of chemicals to
available experimental measurements. For some compounds, either the compound
decomposed before reaching the boiling point, or the measurement was performed under
vacuum and the boiling point was evaluated at low pressure (~0.2 mmHg). As a result,
experimental measurements at atmospheric pressure were unavailable for a number of
compounds, such as metamidophos, malathion, DIMP, and VG. For other compounds,
such as Lewisite L2 and Lewisite L3, a measurement could not be located.

Using the available experimental measurements, it was possible to quantify
the error of the predictions. The ACD Labs method showed the lowest RMSE (20 °C),
compared with EPI Suite (29 °C) and the ADF COSMO-RS method (107 °C).
For glycerol, HD (sulfur mustard), DMMP, GA, GB, GD, GF, and L1, the ACD Labs
software predicts the boiling point almost exactly. ACD Labs overestimates the boiling
point of VX by 7%. Given that ACD Labs takes into account the nonlinear dependence of
the boiling point on the number of occurrences of molecular fragments/groups, it is not
surprising that it robustly predicts boiling points for the traditional agents. Given the
accuracy of the ACD Labs predictions, we conclude that the traditional agents lie within
the span of the models used by ACD Labs. EPI Suite also performs well; its greatest
error, an underestimate of the boiling point by 20%, is for Lewisite L1.
This is to be expected given that the Lewisites contain the semi-metal arsenic, which is
outside the training set of EPI Suite.
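
The RMSE values in Table 3 can be approximately reproduced from the tabulated boiling points. The sketch below is illustrative Python (the function and variable names are not from the original study) and uses only the ten compounds that have an experimental boiling point at atmospheric pressure; small differences from the reported values are expected because the table entries are rounded.

```python
import math

# (EPI, ACD, COSMO-RS, experiment) boiling points in deg C from Table 3.
BOILING_POINTS = {
    "glycerol": (231, 290, 325, 290),
    "HD": (210, 216, 265, 216),
    "HN1": (212, 136, 301, 194),
    "DMMP": (152, 181, 283, 181),
    "GA": (267, 240, 324, 240),
    "GB": (140, 147, 218, 147),
    "GD": (183, 201, 306, 198),
    "GF": (223, 237, 278, 239),
    "L1": (156, 203, 215, 196.6),
    "VX": (321, 319, 550, 298),
}


def rmse(predicted, measured):
    """Root mean square error between two equal-length sequences."""
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


experiment = [row[3] for row in BOILING_POINTS.values()]
rmse_by_tool = {
    tool: rmse([row[i] for row in BOILING_POINTS.values()], experiment)
    for i, tool in enumerate(("EPI", "ACD", "COSMO-RS"))
}
```

With these inputs the three values come out close to the 29, 20, and 107 °C listed in the RMSE row of Table 3.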

The predictions obtained from ADF COSMO-RS deviated from the
experimental measurements the most. For compounds containing a tertiary amine, such as
VX or nitrogen mustard (HN1), the predictions overestimated the boiling point by more
than 100 °C. Predictions for the organophosphates also differed from experiment by more
than 100 °C, except for GA and GF, which differed from experiment by 80 °C and 40 °C,
respectively. Although ADF COSMO-RS is not a traditional QSAR in the sense that a
regression is performed to relate the occurrences of some descriptor to the property of
interest, there is still an empirical fit of the atomic radii used to define the molecular
cavity within the polarizable continuum. It is quite possible that the training set used in
the regression is insufficient and the traditional agents are outside of the domain of
applicability. On the other hand, it is possible that the degrees of freedom afforded by the
atomic radii fit do not result in a sufficiently expansive applicability domain. Lastly,
although Disperse Red 9 and cocaine have amine functional groups and can exist in a salt
form, neither EPI Suite nor ACD Labs can make predictions for the ionic form.


4.2  Vapor  Pressure 


Table 4: Vapor Pressures in Pa at 25 °C

| Compound | EPI | ACD | COSMO-RS | Experiment |
|---|---|---|---|---|
| glycerol | 1.1E-02 | <1.0E-01 | 1.0E-03 | 2.2E-02 |
| HD (sulfur) | 2.1E+01 | 2.7E+01 | 3.0E+01 | 1.5E+01 |
| HN1 (nitrogen mustard) | 2.6E+01 | 1.1E+03 | 1.3E+01 | 3.3E+01 |
| metamidophos | 9.1E+00 | 2.7E+01 | 2.6E-02 | 4.7E-03 |
| malathion | 1.7E-02 | <1.0E-01 | 0.0E+00 | 4.5E-04 |
| dimethyl methylphosphonate (DMMP) | 1.2E+02 | 1.3E+02 | 1.3E+01 | 1.3E+02 |
| diisopropyl methylphosphonate (DIMP) | 3.0E+01 | 2.7E+01 | 4.6E-01 | 3.7E+00 |
| GA | 6.2E+00 | 5.3E+00 | 3.3E+00 | 9.3E+00 |
| GB | 6.1E+02 | 8.0E+02 | 1.4E+02 | 3.8E+02 |
| GD | 5.3E+01 | 6.7E+01 | 9.2E+00 | 5.3E+01 |
| GF | 6.5E+00 | 9.3E+00 | 2.3E+01 | 5.9E+00 |
| L1 | 7.9E+01 | 5.3E+01 | 2.0E+02 | 7.7E+01 |
| L2 | 3.9E+01 | 1.3E+01 | 1.6E+01 | |
| L3 | 3.6E+00 | 2.7E+01 | 8.8E+00 | |
| VX | 2.9E-01 | <1.0E-01 | 4.0E-03 | 9.3E-02 |
| VG | 3.6E-02 | <1.0E-01 | 5.0E-03 | 3.5E-02 |
| Disperse Red 9 | 4.1E-05 | <1.0E-01 | 7.1E-01 | 9.3E-07 |
| cocaine | 1.7E-03 | <1.0E-01 | 1.2E-02 | 3.9E-05 |


The values within Table 4 compare vapor pressure predictions to
experimental values. Unlike the predictions for boiling point, the deviations from
measurements vary from factors of 2 to many orders of magnitude. For most of the
predictions, the large magnitude of error is not significant in practice. This is especially
true when both the prediction and the measurement are very small, as in the comparison
between prediction and experiment for glycerol, malathion, VG, and Disperse Red 9.
Many of the predictions from all three tools are reasonably close to the experimental
measurement, such as for HD (all within a factor of 2), HN1, GA, GB, and L1. On the
other hand, some of the deviations between prediction and measurement are significant
and could drastically affect the output of fate models using the predicted vapor pressures.
The most significant deviation, on the order of two to four orders of magnitude, occurred
for both EPI Suite and ACD Labs with metamidophos. For EPI Suite, the -NH2 fragment in
metamidophos is treated the same as a typical amine group, and the P-N linkage is not
recognized. Most likely, the amine group in metamidophos is much different from a
typical amine. For GA, although there is an N-P linkage, it is possible that the effect of
that linkage is limited by the fact that the nitrogen atom is joined to two methyl groups. A
fate model using the predicted vapor pressures may predict rapid dissipation for
metamidophos, when in fact it is persistent. There is also a limit to the predictive range of
ACD Labs. A vapor pressure lower than 0.1 Pa is outside of the predictive range of ACD
Labs. Thus, ACD Labs can provide a numerical estimate only for vapor pressures higher
than 0.1 Pa; below that, it reports only the upper bound of 0.1 Pa.
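
Because vapor pressures span many decades, the comparison above is naturally expressed as the base-10 logarithm of the ratio of prediction to measurement. The following Python sketch (illustrative names, values copied from Table 4) computes that deviation for metamidophos and treats an ACD Labs entry reported as "<1.0E-01" as a bound rather than a point estimate; the parsing convention is an assumption made for this example.

```python
import math
from typing import Optional


def log10_deviation(predicted_pa: float, measured_pa: float) -> float:
    """Orders of magnitude by which a prediction exceeds the measurement."""
    return math.log10(predicted_pa / measured_pa)


def parse_acd(value: str) -> Optional[float]:
    """Return a numeric prediction, or None when only a bound is reported."""
    if value.startswith("<"):
        return None
    return float(value)


# Metamidophos at 25 deg C (Pa): experiment, EPI Suite, ACD Labs.
measured = 4.7e-3
epi_dev = log10_deviation(9.1, measured)

acd_value = parse_acd("2.7E+01")
acd_dev = log10_deviation(acd_value, measured) if acd_value is not None else None
```

For metamidophos both deviations fall between three and four orders of magnitude, in line with the range quoted above.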

ADF COSMO-RS performed reasonably well against all compounds in the
set. Except for DMMP and DIMP, the predictions were well within an order of magnitude
of the measurements. Measurements for low vapor pressure compounds, such as VX, VG,
and Disperse Red 9, were orders of magnitude different from the predictions, but for these
compounds both the prediction and the measurement were very small, so the impact on a
fate model is minimal. Given that ADF COSMO-RS predictions matched experiment well
for several organophosphorus compounds (G and V agents, and the pesticides malathion
and metamidophos), it is surprising that this method should encounter difficulty with DIMP
and DMMP, where the prediction varies by more than a factor of ten with respect to the
experimental measurement. It should be noted that DMMP and DIMP are very symmetric
molecules. As a result, we expect polarity and polarizability to depend strongly on the
molecular conformation. For ADF COSMO-RS, properties are calculated at the lowest
energy conformation, whereas at typical laboratory ambient temperatures the molecules
may be sampling many more conformations. As a result, predictions of properties that
depend on polarity and polarizability may differ greatly from laboratory measurements.
At the same time, ADF COSMO-RS does not seem to deviate by much more than an order
of magnitude in its vapor pressure estimates, whereas in a number of instances both EPI
Suite and ACD Labs deviate by up to 4 orders of magnitude in vapor pressure. Deviations
of that magnitude can cause serious errors in predictions of the fate and transport of these
materials in the environment.

4.3  Log Water/Octanol Partition Coefficient (Kow)


Table 5: Comparison of Kow (log(P)) predictions from EPI Suite, ACD Labs, Marvin
Kow Calculator, Vega, COSMO-RS, and experimental values for the selected traditional
chemical warfare agents and simulants.

| Compound | EPI | ACD | MARVIN | Vega | COSMO | EXP |
|---|---|---|---|---|---|---|
| glycerol | -1.7 | -1.9 | -1.8 | -2.3 | -1.7 | -1.76 |
| HD (sulfur) | 2.4 | 2.1 | 2.0 | 3.2 | 2.8 | 1.37 |
| HN1 (nitrogen mustard) | 1.4 | 1.4 | 1.9 | 2.0 | 3.6 | 2.02 |
| metamidophos | -0.9 | -0.8 | -0.3 | -0.9 | 0.1 | -0.8 |
| malathion | 2.3 | 2.4 | 1.5 | 3.0 | 3.6 | 2.36 |
| dimethyl methylphosphonate (DMMP) | -0.6 | -0.9 | -0.1 | -0.6 | -0.5 | -0.61 |
| diisopropyl methylphosphonate (DIMP) | 1.2 | 0.8 | 1.4 | 1.2 | 1.4 | 1.03 |
| GA | 0.3 | 0.1 | -0.1 | -0.2 | 1.6 | 0.38 |
| GB | 0.3 | 0.5 | 0.8 | -0.2 | 1.1 | 0.3 |
| GD | 1.7 | 1.8 | 2.1 | 1.2 | 2.3 | 1.78 |
| GF | 1.6 | 1.6 | 1.7 | 1.2 | 2.2 | n/a |
| L1 | 2.6 | n/a | 2.5 | 1.9 | 2.7 | 2.56 |
| L2 | 3.5 | n/a | 3.5 | 3.2 | 3.5 | n/a |
| L3 | 4.5 | n/a | 3.7 | 4.5 | 4.5 | n/a |
| VX | 2.1 | 2.1 | 2.0 | 2.1 | 4.2 | 2.09 |
| VG | 1.7 | 2.9 | 1.8 | 1.7 | 4.8 | 1.7 |
| Disperse Red 9 | 4.1 | 3.0 | 3.0 | 3.0 | 3.6 | 4.1 |
| cocaine | 2.2 | 2.3 | 2.3 | 2.5 | 3.9 | 2.3 |
| RMSE | 0.3 | 0.5 | 0.5 | 0.7 | 1.3 | |


Table 5 compares the predictions of octanol/water partition
coefficients from various available QSAR tools to reported laboratory
measurements in log units. The authors could not find laboratory measurements
for cyclosarin (GF) or for two of the Lewisite agents, L2 and L3. Taking the square root
of the average of the squares of the differences between the predictions and the
experimental measurements (the RMSE, in log units) allows a rough ranking of the
different tools. EPI Suite had the lowest error, 0.33 log units. ChemAxon's Marvin
log(P) Calculator ranked next with an error of 0.50. The error for the
predictions from ACD Labs was close, with a value of 0.54. The error for the
Vega log(P) Java-based calculator was 0.7. The largest deviation from experiment
was obtained with the predictions from ADF COSMO-RS for this particular set of
compounds.


For EPI Suite, ACD Labs, Marvin, and Vega, sulfur mustard (HD) yields
the largest error, followed by the prediction for nitrogen mustard (HN1) for EPI Suite.
Examining the EPI Suite output, it appears that only the basic atomic fragments Cl,
thioether, and methylene carbon are recognized. The proximity of the chlorine atoms to
the sulfur is not accounted for in the available model. The error for HN1 is a factor of 3
less, and an additional correction factor for the ClCCNCCCl fragment appears. Clearly,
the correction factors in EPI Suite's KOWWIN tool, which arise from a second regression
on the deviations of the data from the initial QSAR prediction, are quite effective.
Another point to make is the molecular symmetry of HD and HN1. The symmetry is not
recognized in the fragment output of EPI Suite, or of any of the other fragment-based
methods, where the fragment values are blindly summed into the property prediction.
Kow should be affected by molecular polarity, and the contribution of polar groups to the
molecular polarity can be cancelled out by symmetry. As a result, a failure to account for
molecular symmetry may result in inaccuracy.

ChemAxon's Marvin Log(P) Calculator also uses additional correction
factors derived from molecular descriptors such as atomic partial charges and
polarizability. HD and HN1 also exhibit error for ACD Labs, but this error is less than
that of the predictions from EPI Suite (0.593 and 0.36, respectively). The reduced error
for these two compounds could be attributed to the large training set used for the ACD
Labs algorithm. However, for VG and Disperse Red 9, errors greater than a log unit are
obtained, so that for the limited set of compounds considered, ACD Labs exhibits slightly
more error than EPI Suite and Marvin. Although Vega is based on the same approach as
that used in EPI Suite's KOWWIN program, its average error is greater than that of the
predictions from EPI Suite. An examination of the greatest errors shows that HD gives
the largest error, nearly two log units, and Disperse Red 9 gives an error of 1 log unit.
The output of Vega does indicate that HD and Disperse Red 9 are well outside of the
applicability range. Interestingly, for Disperse Red 9 the prediction from Vega is
identical to the output from EPI Suite with the correction factor of 1.1000 subtracted.
Most likely the difference between Vega and EPI Suite is the implementation of the
correction factors. The greatest error is consistently obtained with the predictions of
ADF COSMO-RS on the limited data set studied here. We conjecture that the error can be
attributed to the fact that the empirical regression of the ADF COSMO-RS method is
performed on the COSMO atomic radii rather than on quantum mechanically calculated
molecular descriptors, such as charge distribution, HOMO-LUMO difference, etc.
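
The additive character of the fragment methods discussed in this section, and the reason they cannot see molecular symmetry, can be illustrated with a minimal Python sketch. All fragment values, the correction term, and the intercept below are hypothetical placeholders chosen for illustration; they are not KOWWIN, ACD Labs, or Vega coefficients.

```python
# Hypothetical fragment contributions to log Kow (placeholders, not KOWWIN values).
FRAGMENT_VALUES = {"CH2": 0.4, "Cl": 0.3, "thioether_S": -0.5}
INTERCEPT = 0.2  # hypothetical regression constant


def fragment_log_kow(fragment_counts: dict, corrections: float = 0.0) -> float:
    """Sum count * value over fragments, then add corrections and intercept."""
    total = sum(FRAGMENT_VALUES[name] * n for name, n in fragment_counts.items())
    return total + corrections + INTERCEPT


# HD (ClCCSCCCl) decomposed as 4 CH2, 2 Cl, and 1 thioether sulfur.
hd_counts = {"CH2": 4, "Cl": 2, "thioether_S": 1}
hd_estimate = fragment_log_kow(hd_counts)
```

The result depends only on the fragment counts: any rearrangement of the same fragments, symmetric or not, returns the same number unless an explicit correction term is supplied.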

4.4  Water  Solubility 

The predictions from EPI Suite, ACD Labs, and ADF COSMO-RS are
compared to experimental values in Table 6. To estimate error, the root mean square of
the difference between the log of the predicted solubility and the log of the experimental
solubility was used to rank the different methods. The two methods within EPI Suite
(Kow based and fragment based) and ACD Labs had similar errors of 0.87, 1.1, and 1.0,
respectively, although the method within EPI Suite that relied on Kow appeared to have
the least error. ADF COSMO-RS had a slightly larger error of 1.3, computed in the same
way from the logarithms of the predictions and the measurements.
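
The error measure used for solubility can be sketched as follows. This illustrative Python snippet (the names are not from the original study) takes the EPI Suite Kow-method predictions and the experimental values from Table 6 for the sixteen compounds that have a measurement, and computes the root mean square of the differences of the base-10 logarithms.

```python
import math

# (EPI Suite Kow-method prediction, experiment) in mg/L, in Table 6 row order,
# omitting L2 and L3, which have no experimental value.
SOLUBILITY_PAIRS = [
    (1.0e6, 1.0e6), (6.1e2, 6.8e2), (4.0e4, 1.6e2), (4.0e5, 1.0e6),
    (7.8e1, 1.4e2), (3.2e5, 1.0e6), (7.3e3, 1.5e3), (3.2e4, 9.8e4),
    (4.6e4, 1.0e6), (1.6e3, 2.1e4), (2.1e3, 3.7e3), (2.6e2, 5.0e2),
    (3.2e3, 3.0e4), (6.6e3, 3.0e4), (6.8e-1, 1.2e-1), (1.3e3, 1.8e3),
]


def log_rmse(pairs) -> float:
    """Root mean square of log10(predicted) - log10(measured)."""
    squared = [(math.log10(p) - math.log10(m)) ** 2 for p, m in pairs]
    return math.sqrt(sum(squared) / len(squared))


epi_kow_log_rmse = log_rmse(SOLUBILITY_PAIRS)
```

Working in log space keeps a single highly soluble compound from dominating the score, which is why a factor-of-ten miss counts the same at 1 mg/L as at 100,000 mg/L.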

For EPI Suite's Kow method, the four compounds that result in the greatest
deviation from experiment are HN1, sarin (GB), soman (GD), and VX. For the greatest
magnitude error, with HN1, solubility was overestimated by nearly six orders of
magnitude. For triethylamine, an analog of HN1 without the two chlorine atoms, EPI
Suite's prediction of 6.82 x 10^4 mg/L is very close to the experimental measurement of
6.86 x 10^4 mg/L. Clearly, the chlorine atoms in HN1 have a drastic effect on solubility
that is not accounted for in EPI Suite's model. This is surprising, since the prediction of
Kow from EPI Suite differs from experiment by only about 0.6 log units. One thing to be
noted in the EPI Suite output is that only one correction factor is applied, due to the
aliphatic amine group. No correction factors for chlorine are included. Another issue to
mention is that molecular symmetry can play a role. Although chlorine atoms within
hydrocarbon molecules tend to be locally polar, if the arrangement of chlorine atoms is
symmetric, that polarity may cancel out, resulting in a nonpolar molecule overall. In
contrast to HN1, the GB, GD, and VX solubilities are underestimated by about an order of
magnitude. Within the output, it appears that no correction factors are available for the
unique structures around the phosphorus in the G and V agents, and so the closeness
between measurement and prediction for GA may be fortuitous.

The fragment-based method in EPI Suite seems to
consistently overestimate solubility in water. DIMP, GF, Disperse Red 9, and HN1 yield
the greatest differences between experiment and prediction. As with the Kow method in
EPI Suite, the structure around the phosphorus atom seems to be unique for GF. Although
all portions of DIMP appear to be represented in the chosen fragments for property
estimation, the predicted solubility for DIMP is still orders of magnitude greater than the
experimental value. It is possible that the discrepancy arises from the fact that the methyl
group attached to the phosphorus is counted in the same way as the methyl and methine
groups in the isopropyl fragments. However, there is a similar methyl group in GB, and
both experiment and prediction show GB to be highly soluble in water. Molecular
symmetry may be an explanation for the lower than expected water solubility, where
symmetry cancels out molecular polarity. For GF, the reason for the large difference
between prediction and experiment is not clear either. All portions of the GF molecule
also have corresponding fragments in the model, yet there is poor agreement. At first
guess, the saturated ring would reduce the solubility, but those aliphatic carbons are
accounted for within the model. An evaluation of the solubility of cyclohexanol shows
good agreement between experiment and prediction (4.2 x 10^4 vs. 4.7 x 10^4 mg/L,
respectively). Again, the substituents around the phosphorus atom are not unique, and
molecular symmetry is not an issue. Perhaps there is some interaction between the
aliphatic carbons and the substituents around the phosphorus atom. The predictions for
Disperse Red 9 and HN1 also overestimate the solubility by more than an order of
magnitude. Both of these molecules have few polar or ionizable functional groups.
Molecular symmetry that cancels the effect of polar fragments may reduce the overall
polarity of the molecules in water, leading to errors in the prediction.

For the ACD Labs predictions, Disperse Red 9, HN1, DIMP, and GF also
had the greatest differences between prediction and measurement. However, unlike EPI
Suite, the version of ACD Labs we possess did not report the details of the fragment
contributions. Since ACD Labs uses the same approach as the EPI Suite fragment method,
we can assume that some of the same issues affect this software package.


Table 6: Water Solubility (mg/L)

| Compound | EPI Kow | EPI Frag | ACD | COSMO-RS | Experiment |
|---|---|---|---|---|---|
| glycerol | 1.0E+06 | 1.0E+06 | 7.1E+05 | 1.0E+06 | 1.0E+06 |
| HD (sulfur) | 6.1E+02 | 4.3E+02 | 4.2E+03 | 3.7E+02 | 6.8E+02 |
| HN1 (nitrogen mustard) | 4.0E+04 | 7.3E+03 | 1.6E+04 | 1.3E+02 | 1.6E+02 |
| metamidophos | 4.0E+05 | 1.0E+06 | 1.0E+06 | 1.0E+06 | 1.0E+06 |
| malathion | 7.8E+01 | 4.3E+02 | 3.0E+02 | 6.6E+01 | 1.4E+02 |
| DMMP | 3.2E+05 | 1.0E+06 | 6.3E+05 | 1.0E+06 | 1.0E+06 |
| DIMP | 7.3E+03 | 2.2E+05 | 3.4E+04 | 1.0E+06 | 1.5E+03 |
| GA | 3.2E+04 | 1.0E+06 | 3.5E+05 | 1.6E+04 | 9.8E+04 |
| GB | 4.6E+04 | 1.0E+06 | 4.2E+05 | 1.0E+06 | 1.0E+06 |
| GD | 1.6E+03 | 3.4E+05 | 5.0E+04 | 1.1E+04 | 2.1E+04 |
| GF | 2.1E+03 | 5.4E+05 | 6.2E+04 | 1.3E+04 | 3.7E+03 |
| L1 | 2.6E+02 | 4.7E+03 | n/a | 1.8E+03 | 5.0E+02 |
| L2 | 2.9E+01 | 5.0E+02 | n/a | 2.0E+02 | |
| L3 | 3.3E+00 | 5.2E+01 | n/a | 1.9E+01 | |
| VX | 3.2E+03 | 9.1E+04 | 4.8E+03 | 5.0E+02 | 3.0E+04 |
| VG | 6.6E+03 | 4.7E+04 | 1.6E+04 | 1.0E+02 | 3.0E+04 |
| Disperse Red 9 | 6.8E-01 | 8.6E+00 | 4.5E-04 | 1.7E+02 | 1.2E-01 |
| cocaine | 1.3E+03 | 1.0E+03 | 5.3E+02 | 3.5E+02 | 1.8E+03 |
| RMSE* | 0.87 | 1.1 | 1.0 | 1.3 | |

*Root mean square of the differences between log(predicted) and log(experiment).


4.5  pKa Predictions

Table 7 contains a comparison of pKa predictions from ACD Labs,
ADF COSMO-RS, and Marvin. Unfortunately, we could not locate experimental pKa
values for most of the compounds; the exceptions were glycerol, HN1 (nitrogen
mustard), and VX. For glycerol, only the first proton dissociation constant was reported,
at a pKa of 14.15. There was a difference of about 0.5 log units between ACD Labs,
Marvin, and the experimental measurement. ADF COSMO-RS was two log units off. The
same was true for HN1, with both Marvin and ACD Labs within about 0.3 log units, and
ADF COSMO-RS 0.83 log units off. For VX, the errors were greater, with
ADF COSMO-RS 0.7 log units off, ACD Labs 1.2 log units off, and Marvin 2.0 log units
off. Good agreement was found for glycerol, where the R-OH functional group is
contained within the ACD Labs library of ionizable functional groups. A similar degree of
accuracy is found with Marvin, although a slightly different approach is used. ACD Labs
applies an empirical correction factor, and perhaps the contribution from partial atomic
charge and polarizability in Marvin acts in the same way as the correction factor within
ACD Labs. Although the predictions from ADF COSMO-RS were somewhat close to the
experimental values, the differences between the predictions from this method and the
experimental results were greater than those from ACD Labs or Marvin for glycerol and
HN1. It should also be noted that Marvin was unable to make a prediction for GA and
metamidophos. Most likely it could not recognize the N-P linkage in either GA or
metamidophos.


Table 7: Comparison of pKa predictions and available experimental data.


ACD 

ADF 

COSMO-RS 

Marvin 

EXP 

H1

H2 

H3 

H1

H2 

H3 

glycerol 

13.7 

14.8 

15.9 

12.3 

14.4 

21.9 

13.6 

15.2 

14.15

HD (sulfur)

n/a 

n/a 

HN1 (nitrogen mustard)

6.5 

7.4 

6.3 

6.57 

metamidophos 

-0.6 

3.2 

na 

malathion 

n/a 

n/a 

DMMP 

n/a 

n/a 

diisopropyl  methyl 
phosphonate 

n/a 

n/a 

GA 

-4.7 

-2.4 

n/a 

GB 

n/a 

n/a 

GD 

n/a 

n/a 

GF 

n/a 

n/a 

L1

n/a 

n/a 

L2 

n/a 

L3 

n/a 

VX 

9.8 

7.9 

10.6 

8.6 

9.4 

VG 

9.4 

6.2 

9.9 

disperse  red-9 

2.3 

2.3 

2.2 

cocaine 

9.0 

9.9 

8.9 

For most of the compounds, such as HD, malathion, DMMP, DIMP, GB,
GD, GF, L1, L2, and L3, a readily ionizable atom center was not found within the
molecule, and a pKa could not be determined for these compounds. The rest of the
molecules possessed either an amine functional group or an oxygen atom with a proton.
Although ADF COSMO-RS is not based on the fragment QSAR method, this approach
also did not determine a pKa for the compounds lacking an ionizable center.


5.  CONCLUSIONS

A number of in-silico tools were used to predict physico-chemical
properties that are relevant to modeling the environmental fate of a number of traditional
agents and simulants, and these results were compared to available experimental data.
Specifically, the boiling point, vapor pressure, log of the water/octanol partitioning
coefficient (Kow), water solubility, and pKa were the properties surveyed. For boiling
point, both ACD Labs and EPI Suite were highly accurate, with RMSEs of 20 °C and
29 °C, respectively. EPI Suite, ACD Labs, and ADF COSMO-RS performed quite well for
vapor pressure predictions, except that ACD Labs could not generate predictions for vapor
pressures of less than 0.1 Pa. For Kow, EPI Suite, ACD Labs, ChemAxon's Marvin, Vega,
and ADF COSMO-RS were evaluated. The RMSEs for the Kow values from EPI Suite,
ACD Labs, Marvin, Vega, and ADF COSMO-RS were 0.3, 0.5, 0.5, 0.7, and 1.3 log
units, respectively. ACD Labs was unable to give a prediction for any of the Lewisite
compounds. EPI Suite's Kow estimation method proved to have the smallest difference
between prediction and measurement. For water solubility, the root mean square
differences between the logs of the predictions and measurements for EPI Suite's Kow
method, EPI Suite's fragment-based estimation method, ACD Labs, and ADF COSMO-RS
were 0.87, 1.1, 1.0, and 1.3, respectively. We hypothesize that the greatest
difference between prediction and experimental measurement occurred for ADF
COSMO-RS because, although it is based on DFT calculations, there is still an empirical
fit that is based on only 642 compounds. There is not enough information to comment on
the applicability domain for ADF COSMO-RS. Also, ADF COSMO-RS depends on a single
molecular descriptor, i.e., the difference in charge density between gas phase and
condensed phase calculations. We would expect a regression on a larger training set
including additional molecular descriptors to greatly improve the performance of this
approach. For pKa estimations, experimental data for only three of the compounds
examined could be located, and so the accuracy of ACD Labs and ChemAxon's Marvin
for the traditional agents could not be fully evaluated.

We reason that two issues contribute to the difference between experimental
measurements and estimations from fragment-based methods. First, as would be expected,
compounds containing elements or functional groups outside of the method's training set
contributed to the average error, such as the element arsenic in L1. The same issue is true
for vapor pressure, Kow, and solubility calculations, where unique functional groups around
the phosphorus atoms, such as the P-N chemical link to a primary amine in methamidophos,
result in error. For a property such as boiling point or vapor pressure, which is governed
by constitutional descriptors such as the elemental constituents or molecular mass, details
of the molecular structure are not important. For properties involving behavior in water,
where molecular interactions are important, descriptors such as polarity or polarizability
intuitively also become important. Yet the fragment-based methods treat each fragment
the same regardless of whether it is attached to an aliphatic carbon atom or a phosphorus atom.
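
The context-free additivity described above can be sketched in a few lines. This is a minimal illustration of a generic group-contribution estimate, not the algorithm of EPI Suite or ACD Labs; the fragment names, coefficients, and intercept are invented for the example.

```python
import math

# Hypothetical fragment coefficients, invented for illustration only.
# They are NOT taken from EPI Suite, ACD Labs, or any other tool.
FRAGMENT_COEFFS = {"CH3": 0.55, "CH2": 0.49, "NH2": -1.41, "P=O": -2.40}
INTERCEPT = 0.23  # hypothetical regression constant


def estimate_property(fragment_counts):
    """Additive estimate: intercept + sum(count * coefficient)."""
    return INTERCEPT + sum(
        count * FRAGMENT_COEFFS[name] for name, count in fragment_counts.items()
    )


def contribution_of(fragment, fragment_counts):
    """Change in the estimate caused by the given fragment alone."""
    without = {k: v for k, v in fragment_counts.items() if k != fragment}
    return estimate_property(fragment_counts) - estimate_property(without)


# The same NH2 fragment in two different (hypothetical) environments.
amine_on_carbon = {"CH3": 1, "CH2": 1, "NH2": 1}
amine_on_phosphorus = {"CH3": 2, "P=O": 1, "NH2": 1}

on_c = contribution_of("NH2", amine_on_carbon)
on_p = contribution_of("NH2", amine_on_phosphorus)
print(math.isclose(on_c, on_p))
```

In this sketch the amine contributes the same amount whether its neighbor is carbon or phosphorus, so any real difference in behavior between the two environments would appear as prediction error.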

The second issue that seems apparent on examination of the data is that the
molecules that tend to produce the largest differences between model prediction and
experimental measurement have molecular symmetry. For example, HN1 is highly
symmetric around the nitrogen atom, and the carbonyl groups in Disperse Red 9 are
arranged on opposite sides of a ring. For properties highly dependent on molecular
structure and polarity, such as water solubility, a fragment-based method can contribute
significantly to the error, since the fragment contributions are treated additively. It is
possible that symmetry cancels out the contributions from a given fragment in the actual
molecule while the additive model still counts them in full, so that a property is
overestimated, as occurs for EPI Suite's fragment method for solubility in water on HN1,
DIMP, and Disperse Red 9.
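
The symmetry argument can be made concrete with a toy calculation. The sketch below compares a vector sum of bond dipoles, which cancels for opposed groups, with a scalar additive sum, which does not. It is a simplified two-dimensional picture under assumed dipole magnitudes and angles; the function names and values are hypothetical and do not describe any of the tools surveyed.

```python
import math


def net_dipole(bond_dipoles):
    """Magnitude of the vector sum of (magnitude, angle in degrees) dipoles."""
    x = sum(m * math.cos(math.radians(a)) for m, a in bond_dipoles)
    y = sum(m * math.sin(math.radians(a)) for m, a in bond_dipoles)
    return math.hypot(x, y)


def additive_polarity(bond_dipoles):
    """Scalar sum that ignores direction, as an additive fragment count would."""
    return sum(m for m, _ in bond_dipoles)


# Two identical polar groups (hypothetical magnitude 2.5) in two arrangements.
symmetric = [(2.5, 0.0), (2.5, 180.0)]   # opposite sides, as across a ring
asymmetric = [(2.5, 0.0), (2.5, 60.0)]   # same side of the molecule

for name, dipoles in (("symmetric", symmetric), ("asymmetric", asymmetric)):
    print(name, round(net_dipole(dipoles), 3), additive_polarity(dipoles))
```

Both arrangements receive the same additive score, while the vector sum vanishes for the symmetric case. An additive model would therefore assign the symmetric molecule more polarity, and plausibly more water solubility, than it actually has.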

Overall, EPI Suite and ACD Labs gave reasonable property predictions
when the compound of interest fell within the model applicability domain and
was asymmetric. We project that a model based on quantum mechanical descriptors
would be insensitive to these effects since 1) any unique fragment or functional group
could be "translated" into a more universal property descriptor such as dipole moment, and
2) symmetry would be automatically accounted for.
ACRONYMS


ACD       Advanced Chemistry Development (Labs)
ADF       Amsterdam Density Functional Code
CAS       Chemical Abstracts Service
CWA       Chemical Warfare Agent
COSMO     COnductor-like Screening Model
DIMP      diisopropyl methyl phosphonate
DMMP      dimethyl methyl phosphonate
DFT       Density Functional Theory
EPA       Environmental Protection Agency
GA        Tabun
GB        Sarin
GD        Soman
GF        Cyclosarin
HD        Sulfur mustard agent
HN1       Nitrogen mustard agent ((ClC2H4)2N(C2H5))
HOMO      Highest Occupied Molecular Orbital
LUMO      Lowest Unoccupied Molecular Orbital
L1        Lewisite (AsC2H2Cl3)
L2        Lewisite (As(C2H2Cl)2Cl)
L3        Lewisite (As(C2H2Cl)3)
QSAR      Quantitative Structure Activity Relationship
RMSE      Root Mean Square Error
SMILES    Simplified Molecular-Input Line-Entry System
TIC       Toxic Industrial Chemical
VG        Amiton/Tetram (C10H24NO3PS)
VX        Cold War Agent (C11H26NO2PS)